Goto

Collaborating Authors

 conversational assistant


User Misconceptions of LLM-Based Conversational Programming Assistants

arXiv.org Artificial Intelligence

Programming assistants powered by large language models (LLMs) have become widely available, with conversational assistants like ChatGPT proving particularly accessible to less experienced programmers. However, the varied capabilities of these tools across model versions and the mixed availability of extensions that enable web search, code execution, or retrieval-augmented generation create opportunities for user misconceptions about what systems can and cannot do. Such misconceptions may lead to over-reliance, unproductive practices, or insufficient quality control in LLM-assisted programming. Here, we aim to characterize misconceptions that users of conversational LLM-based assistants may have in programming contexts. Using a two-phase approach, we first brainstorm and catalog user misconceptions that may occur, and then conduct a qualitative analysis to examine whether these conceptual issues surface in naturalistic Python-programming conversations with an LLM-based chatbot drawn from an openly available dataset. Indeed, we see evidence that some users have misplaced expectations about the availability of LLM-based chatbot features like web access, code execution, or non-text output generation. We also see potential evidence for deeper conceptual issues around the scope of information required to debug, validate, and optimize programs. Our findings reinforce the need for designing LLM-based tools that more clearly communicate their programming capabilities to users.


OpenAI and Google are launching supercharged AI assistants. Here's how you can try them out.

MIT Technology Review

On Tuesday, Google announced its own new tools, including a conversational assistant called Gemini Live, which can do many of the same things. It also revealed that it's building a sort of "do-everything" AI agent, which is currently in development but will not be released until later this year. Soon you'll be able to explore for yourself to gauge whether you'll turn to these tools in your daily routine as much as their makers hope, or whether they're more like a sci-fi party trick that eventually loses its charm. Here's what you should know about how to access these new tools, what you might use them for, and how much it will cost. What it's capable of: The model can talk with you in real time, with a response delay of about 320 milliseconds, which OpenAI says is on par with natural human conversation.


HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants

arXiv.org Artificial Intelligence

Language models (LMs) as conversational assistants recently became popular tools that help people accomplish a variety of tasks. These typically result from adapting LMs pretrained on general domain text sequences through further instruction-tuning and possibly preference optimisation methods. The evaluation of such LMs would ideally be performed using human judgement, however, this is not scalable. On the other hand, automatic evaluation featuring auxiliary LMs as judges and/or knowledge-based tasks is scalable but struggles with assessing conversational ability and adherence to instructions. To help accelerate the development of LMs as conversational assistants, we propose a novel automatic evaluation task: HumanRankEval (HRE). It consists of a large-scale, diverse and high-quality set of questions, each with several answers authored and scored by humans. To perform evaluation, HRE ranks these answers based on their log-likelihood under the LM's distribution, and subsequently calculates their correlation with the corresponding human rankings. We support HRE's efficacy by investigating how efficiently it separates pretrained and instruction-tuned LMs of various sizes. We show that HRE correlates well with human judgements and is particularly responsive to model changes following instruction-tuning.


Meta confesses it's using what you post to train its AI

FOX News

"CyberGuy" explains how Meta is admitting to using user data to train its AI. How would you feel if your social media posts were used to train a virtual assistant without your consent? That is exactly what is happening to millions of people who belong to Facebook and Instagram. Meta, the parent company of Facebook, admits that it is using public posts from both Instagram and Facebook members to train its new artificial intelligence assistant, Meta AI. CLICK TO GET KURT'S FREE CYBERGUY NEWSLETTER WITH SECURITY ALERTS, QUICK VIDEO TIPS, TECH REVIEWS, AND EASY HOW-TO'S TO MAKE YOU SMARTER Meta admits to using your posts to train its AI.


Grounded Complex Task Segmentation for Conversational Assistants

arXiv.org Artificial Intelligence

Following complex instructions in conversational assistants can be quite daunting due to the shorter attention and memory spans when compared to reading the same instructions. Hence, when conversational assistants walk users through the steps of complex tasks, there is a need to structure the task into manageable pieces of information of the right length and complexity. In this paper, we tackle the recipes domain and convert reading structured instructions into conversational structured ones. We annotated the structure of instructions according to a conversational scenario, which provided insights into what is expected in this setting. To computationally model the conversational step's characteristics, we tested various Transformer-based architectures, showing that a token-based approach delivers the best results. A further user study showed that users tend to favor steps of manageable complexity and length, and that the proposed methodology can improve the original web-based instructional text. Specifically, 86% of the evaluated tasks were improved from a conversational suitability point of view.


"Alexa doesn't have that many feelings": Children's understanding of AI through interactions with smart speakers in their homes

arXiv.org Artificial Intelligence

As voice-based Conversational Assistants (CAs), including Alexa, Siri, Google Home, have become commonly embedded in households, many children now routinely interact with Artificial Intelligence (AI) systems. It is important to research children's experiences with consumer devices which use AI techniques because these shape their understanding of AI and its capabilities. We conducted a mixed-methods study (questionnaires and interviews) with primary-school children aged 6-11 in Scotland to establish children's understanding of how voice-based CAs work, how they perceive their cognitive abilities, agency and other human-like qualities, their awareness and trust of privacy aspects when using CAs and what they perceive as appropriate verbal interactions with CAs. Most children overestimated the CAs' intelligence and were uncertain about the systems' feelings or agency. They also lacked accurate understanding of data privacy and security aspects, and believed it was wrong to be rude to conversational assistants. Exploring children's current understanding of AI-supported technology has educational implications; such findings will enable educators to develop appropriate materials to address the pressing need for AI literacy.


Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text)

arXiv.org Artificial Intelligence

AI is promising in assisting UX evaluators with analyzing usability tests, but its judgments are typically presented as non-interactive visualizations. Evaluators may have questions about test recordings, but have no way of asking them. Interactive conversational assistants provide a Q&A dynamic that may improve analysis efficiency and evaluator autonomy. To understand the full range of analysis-related questions, we conducted a Wizard-of-Oz design probe study with 20 participants who interacted with simulated AI assistants via text or voice. We found that participants asked for five categories of information: user actions, user mental model, help from the AI assistant, product and task information, and user demographics. Those who used the text assistant asked more questions, but the question lengths were similar. The text assistant was perceived as significantly more efficient, but both were rated equally in satisfaction and trust. We also provide design considerations for future conversational AI assistants for UX evaluation.


Ethics And Conversational Assistants

#artificialintelligence

It is utopian to rule out any form of anthropomorphism when addressing a conversational assistant because of the use of language as a vector of exchange. Designers, therefore, must limit these shortcomings with the implementation of these design rules, thus reducing the risks of deception and dependency, and giving confidence in these systems.


Who Is Responsible for Ethical AI?

#artificialintelligence

Fair and Unbiased: We educate and teach our kids every day to be fair and unbiased towards people in our society. How do we make sure that a similar type of guidance is adopted by AI? Think of a recruiting app using AI to parse and sort through candidates. Any data set might contain inherent biases that an AI model might reinforce to discriminate against some candidates. It is our job to ensure this type of AI model is designed to operate around these biases. Even more than that, AI models should be designed to detect bias and errors in blind spots to make the outcome fairer.


Error Detection in Large-Scale Natural Language Understanding Systems Using Transformer Models

arXiv.org Artificial Intelligence

Large-scale conversational assistants like Alexa, Siri, Cortana and Google Assistant process every utterance using multiple models for domain, intent and named entity recognition. Given the decoupled nature of model development and large traffic volumes, it is extremely difficult to identify utterances processed erroneously by such systems. We address this challenge to detect domain classification errors using offline Transformer models. We combine utterance encodings from a RoBERTa model with the Nbest hypothesis produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of humanannotated utterances with domain classification errors. We tested our approach for detecting misclassifications from one domain that accounts for <0.5% of the traffic in a large-scale conversational AI system. Our approach achieves an F1 score of 30% outperforming a bi- LSTM baseline by 16.9% and a standalone RoBERTa model by 4.8%. We improve this further by 2.2% to 32.2% by ensembling multiple models.